Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization
Federated learning (FL) is a distributed paradigm that coordinates massive numbers of local clients to collaboratively train a global model via stage-wise local training on heterogeneous datasets. Previous works have implicitly shown that FL suffers from the client-drift problem, which is caused by the inconsistent optima across local clients. However, a solid theoretical analysis explaining the impact of this local inconsistency is still lacking. To alleviate the negative impact of client drift and explore its substance in FL, in this paper we first design an efficient FL algorithm, FedInit, which employs a personalized relaxed initialization state at the beginning of each local training stage.
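The relaxed initialization idea can be illustrated with a minimal sketch. This is not the paper's exact formulation: it assumes the initialization blends the global model with each client's last local state via a coefficient `beta`, uses a toy least-squares local objective, and FedAvg-style averaging; all names and the sign convention are hypothetical.

```python
import numpy as np

def fedinit_round(global_w, last_local_ws, client_data,
                  beta=0.1, lr=0.05, local_steps=5):
    """One communication round with relaxed (personalized) initialization.

    Instead of starting local training from the global model w^t, each
    client i starts from w^t shifted toward its own previous local state,
    scaled by beta (beta = 0 recovers plain FedAvg initialization).
    """
    new_local_ws = []
    for w_last, (X, y) in zip(last_local_ws, client_data):
        # Personalized relaxed initialization (hypothetical convention).
        w = global_w - beta * (global_w - w_last)
        for _ in range(local_steps):
            # Gradient of the local least-squares loss ||Xw - y||^2 / (2n).
            grad = X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
        new_local_ws.append(w)
    # Server aggregates local models by simple averaging.
    new_global = np.mean(new_local_ws, axis=0)
    return new_global, new_local_ws
```

With heterogeneous client objectives, the per-client starting points differ, which is the mechanism the paper uses to counteract the inconsistency accumulated during local training.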
Addressing Algorithmic Disparity and Performance Inconsistency in Federated Learning
Federated learning (FL) has gained growing interest for its capability of learning from distributed data sources collectively without the need to access raw data samples across different sources. So far, FL research has mostly focused on improving performance; how algorithmic disparity is affected in models learned via FL, and the impact of algorithmic disparity on utility inconsistency, remain largely unexplored. In this paper, we propose an FL framework that jointly considers performance consistency and algorithmic fairness across different local clients (data sources). We derive our framework from a constrained multi-objective optimization perspective, in which we learn a model satisfying fairness constraints on all clients with consistent performance. Specifically, we treat the prediction loss at each local client as an objective and optimize the worst-performing client under fairness constraints by optimizing a surrogate maximum function involving all objectives. A gradient-based procedure is employed to achieve the Pareto optimality of this optimization problem. Theoretical analysis proves that our method converges to a Pareto solution that achieves the min-max performance with fairness constraints on all clients. Comprehensive experiments on synthetic and real-world datasets demonstrate the superiority of our approach over baselines and its effectiveness in achieving both fairness and consistency across all local clients.
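The "surrogate maximum function" over per-client losses can be sketched with a log-sum-exp smooth max, a common differentiable stand-in for the hard maximum. This is an illustration of the general technique, not the paper's exact surrogate; the temperature `tau` and function names are assumptions.

```python
import numpy as np

def smooth_max(losses, tau=0.1):
    """Log-sum-exp surrogate for max_i losses[i]; tau -> 0 recovers the hard max."""
    losses = np.asarray(losses, dtype=float)
    m = losses.max()  # subtract the max to stabilize the exponentials
    return m + tau * np.log(np.exp((losses - m) / tau).sum())

def client_weights(losses, tau=0.1):
    """Gradient of smooth_max w.r.t. each client loss: a softmax that
    concentrates mass on the worst-performing client(s)."""
    losses = np.asarray(losses, dtype=float)
    e = np.exp((losses - losses.max()) / tau)
    return e / e.sum()
```

A server minimizing this surrogate would combine per-client gradients using `client_weights`, automatically upweighting the worst-off client; the fairness constraints in the paper would enter as additional constrained objectives on top of this min-max structure.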
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Virginia (0.04)
- Asia > Singapore (0.04)
- (5 more...)
Traceable Black-box Watermarks for Federated Learning
Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang
Due to the distributed nature of Federated Learning (FL) systems, each local client has access to the global model, posing a critical risk of model leakage. Existing works have explored injecting watermarks into local models to enable intellectual property protection. However, these methods either focus on non-traceable watermarks or traceable but white-box watermarks. We identify a gap in the literature regarding the formal definition of traceable black-box watermarking and the formulation of the problem of injecting such watermarks into FL systems. In this work, we first formalize the problem of injecting traceable black-box watermarks into FL. Based on the problem, we propose a novel server-side watermarking method, $\mathbf{TraMark}$, which creates a traceable watermarked model for each client, enabling verification of model leakage in black-box settings. To achieve this, $\mathbf{TraMark}$ partitions the model parameter space into two distinct regions: the main task region and the watermarking region. Subsequently, a personalized global model is constructed for each client by aggregating only the main task region while preserving the watermarking region. Each model then learns a unique watermark exclusively within the watermarking region using a distinct watermark dataset before being sent back to the local client. Extensive results across various FL systems demonstrate that $\mathbf{TraMark}$ ensures the traceability of all watermarked models while preserving their main task performance.
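The partitioned aggregation step can be sketched as follows. This is a minimal illustration under assumptions: parameters are flattened into one vector, a boolean mask marks the main-task region, and the function name is hypothetical.

```python
import numpy as np

def tramark_aggregate(local_models, main_mask):
    """Build one personalized global model per client (sketch).

    local_models: list of flattened parameter vectors, one per client.
    main_mask: boolean array over the parameter vector; True marks the
    main-task region, False the watermarking region.

    Only the main-task region is averaged across clients; each client keeps
    its own parameters in the watermarking region, so its unique watermark
    survives aggregation.
    """
    avg_main = np.stack(local_models).mean(axis=0)
    return [np.where(main_mask, avg_main, w) for w in local_models]
```

Because the watermarking regions never mix across clients, a leaked model can later be traced to its owner by querying it on that client's watermark dataset in a black-box fashion.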
A Table of Notations

Table 3: Table of Notations throughout the paper.
Indices: c, c′ — index for classes (c ∈ {1, …, C} = [C]); i — index for data (i ∈ {1, …, N} = [N]); k, k′ — …
The softened softmax probability calculated without the true-class logit on the server/client model. Class distribution on datasets: p = [p…. For the federated learning setting, we calculate this measure on the server model.

Here we provide details of our experimental setups. Multi-GPU training was not used in the paper's experiments. The details about each dataset and setup are described in Table 4. For CIFAR-100, we add Cutout [12] augmentation. We use a momentum SGD optimizer with an initial learning rate of 0.01, and the momentum is set as …. The learning rate is decayed with a factor of 0.99 at each round. In the motivational experiment in Section 3, we fix the learning rate at 0.01. Since we assume a synchronized federated learning scenario, parallel distributed learning is simulated by sequentially training the sampled clients and then aggregating them into a global model. For the implemented algorithms, we search hyperparameters and choose the best among the candidates. The hyperparameters for each algorithm are in Table 5. The sharding strategy is used, and the sizes of the local datasets are identical.

The conceptual illustration of federated distillation methods is in Figure 9. In contrast to such methods, our proposed FedNTD does not have such constraints (Figure 9c). Additional resource requirements compared to FedAvg are summarized per method (no additional requirements; statefulness).

We extend the motivational experiment in Section 3.1 to the main experimental setups. The value in parentheses is the forgetting F. We report an additional experiment on a popular architecture, ResNet-10, which is about 10x larger than the 2-conv + 2-fc model used in the main experiments. The result is plotted in Figure 11. Here we investigate the personalized performance of our FedNTD.
The results are in Table 13, and Figure 13 shows the corresponding learning curves. FedNTD consistently improves the performance even in such cases. Figure 13: Learning curves corresponding to Table 13. The loss term introduced by FedAlign aims to seek out-of-distribution generality w.r.t. … Figure 14: Loss space of the learned model (Client 16 / LDA α = 0.5).
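The "softened softmax probability calculated without the true-class logit" mentioned above can be sketched as a distillation loss over the not-true classes only. This is an illustrative reconstruction, not the paper's exact loss; the temperature `tau`, the KL direction, and all function names are assumptions.

```python
import numpy as np

def not_true_softmax(logits, true_class, tau=1.0):
    """Softened softmax over the not-true classes: the true-class logit is
    dropped before normalizing, so the distribution covers C-1 classes."""
    z = np.asarray(logits, dtype=float) / tau
    keep = np.ones_like(z, dtype=bool)
    keep[true_class] = False
    z = z[keep]
    z = z - z.max()  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

def ntd_loss(student_logits, teacher_logits, true_class, tau=1.0):
    """KL(teacher || student) restricted to the not-true classes (sketch).

    Matching only the not-true-class predictions preserves knowledge about
    classes absent from the local data without fighting the local true-class
    signal, which is the intuition behind reduced forgetting."""
    p = not_true_softmax(teacher_logits, true_class, tau)
    q = not_true_softmax(student_logits, true_class, tau)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

In a federated round, the global (server) model would play the teacher role and the local model the student, with this term added to the usual cross-entropy on local data.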